Bump hunting with non-Gaussian kernels
It is well known that the number of modes of a kernel density estimator is
monotone nonincreasing in the bandwidth if the kernel is a Gaussian density.
There is numerical evidence of nonmonotonicity in the case of some non-Gaussian
kernels, but little additional information is available. The present paper
provides theoretical and numerical descriptions of the extent to which the
number of modes is a nonmonotone function of bandwidth in the case of general
compactly supported densities. Our results address popular kernels used in
practice, for example, the Epanechnikov, biweight and triweight kernels, and
show that in such cases nonmonotonicity is present with strictly positive
probability for all sample sizes $n \geq 3$. In the Epanechnikov and biweight cases
the probability of nonmonotonicity equals 1 for all $n \geq 2$. Nevertheless, in
spite of the prevalence of lack of monotonicity revealed by these results, it
is shown that the notion of a critical bandwidth (the smallest bandwidth above
which the number of modes is guaranteed to be monotone) is still well defined.
Moreover, just as in the Gaussian case, the critical bandwidth is of the same
size as the bandwidth that minimises mean squared error of the density
estimator. These theoretical results, and new numerical evidence, show that the
main effects of nonmonotonicity occur for relatively small bandwidths, and have
negligible impact on many aspects of bump hunting.
Comment: Published at http://dx.doi.org/10.1214/009053604000000715 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
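The dependence of the mode count on bandwidth described in this abstract can be explored numerically. The following is a minimal sketch, not the authors' procedure: it evaluates a kernel density estimate on a fixed grid and counts rise-then-fall sign changes. The function names, the grid size, the padding rule and the flatness tolerance `tol` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u):
    # Compactly supported kernel on [-1, 1].
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(grid, data, h, kernel):
    # Kernel density estimate with bandwidth h, evaluated at each grid point.
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

def count_modes(values, tol=1e-12):
    # Count local maxima of a sampled curve. Steps smaller than tol are
    # treated as flat and dropped, so a plateau counts as a single mode.
    d = np.diff(values)
    signs = np.sign(np.where(np.abs(d) < tol, 0.0, d))
    signs = signs[signs != 0]
    # A mode is a rising step followed (after any flat steps) by a falling step.
    return int(np.sum((signs[:-1] > 0) & (signs[1:] < 0)))

def mode_counts(data, bandwidths, kernel, grid_size=4001):
    # Pad the grid (assumed rule: 4x the largest bandwidth) so the estimate
    # starts and ends in its tails.
    pad = 4.0 * max(bandwidths)
    grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
    return [count_modes(kde(grid, data, h, kernel)) for h in bandwidths]
```

Calling `mode_counts(data, bandwidths, epanechnikov_kernel)` on an increasing sequence of bandwidths and checking whether the resulting list ever increases is one way to look for nonmonotonicity in a given sample. The result is only as reliable as the grid resolution, so a fine grid is advisable for small bandwidths.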
A test of mode existence with applications to multimodality
Modes, or local maxima, are often among the most interesting features of a probability density function. Given a set of data drawn from an unknown density, it is frequently desirable to know whether or not the density is multimodal, and various procedures have been suggested for investigating the question of multimodality in the context of hypothesis testing. Available tests, however, suffer from the encumbrance of testing the entire density at once, frequently through the use of nonparametric density estimates using a single bandwidth parameter. Such a procedure puts the investigator examining a density with several modes of varying sizes at a disadvantage. A new test is proposed involving testing the reality of individual observed modes, rather than directly testing the number of modes of the density as a whole. The test statistic used is a measure of the size of the mode: the absolute integrated difference between the estimated density and the same density with the mode in question excised at the level of the higher of its two surrounding antimodes. Samples are simulated from a conservative member of the composite null hypothesis to estimate p-values within a Monte Carlo setting. Such a test can be combined with the graphical notion of a "mode tree," in which estimated mode locations are plotted over a range of kernel bandwidths. In this way, one can obtain a procedure for examining, in an adaptive fashion, not only the reality of individual modes, but also the overall number of modes of the density. A proof of consistency of the test statistic is offered, simulation results are presented, and applications to real data are illustrated.
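The mode-size statistic described above can be illustrated on a density sampled on a grid. This is a rough sketch under one reading of the abstract, not the paper's implementation: the function name `mode_size` is hypothetical, the antimodes are located by walking downhill from the mode, and a grid edge is treated as an antimode when no interior one exists on that side.

```python
import numpy as np

def mode_size(grid, density, mode_idx):
    # Walk downhill from the mode to the nearest antimode (or grid edge)
    # on each side.
    left = mode_idx
    while left > 0 and density[left - 1] <= density[left]:
        left -= 1
    right = mode_idx
    while right < len(density) - 1 and density[right + 1] <= density[right]:
        right += 1

    # Cut level: the higher of the two surrounding antimodes.
    level = max(density[left], density[right])

    # Between the two antimodes the curve rises to the mode and then falls,
    # so the part above the cut level is a single contiguous piece.
    excess = np.clip(density[left:right + 1] - level, 0.0, None)

    # Trapezoidal rule for the area removed when the mode is excised.
    dx = np.diff(grid[left:right + 1])
    return float(np.sum(0.5 * (excess[:-1] + excess[1:]) * dx))
```

For example, on a piecewise-linear curve with peaks of height 1.0 and 0.6 separated by an antimode of height 0.2, the smaller peak is cut at 0.2 and only the area above that level counts toward its size. The Monte Carlo calibration of p-values is not sketched here, because the abstract does not specify the null density used.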
High order data sharpening for density estimation
It is shown that data sharpening can be used to produce density estimators that enjoy arbitrarily high orders of bias reduction. Practical advantages of this approach, relative to competing methods, are demonstrated. They include the sheer simplicity of the estimators, which makes code for computing them particularly easy to write; very good mean-squared error performance; reduced "wiggliness" of estimates; and greater robustness against undersmoothing.
New Terrain in the Mode Forest
The mode tree of Minnotte and Scott (1993) provides a valuable method of investigating features such as modes and bumps in an unknown density. By examining kernel density estimates for a range of bandwidths, we can learn a lot about the structure of a data set. Unfortunately, the basic mode tree can be strongly affected by small changes in the data, and gives no way to differentiate between important modes and those caused, for example, by outliers. The mode forest overcomes these difficulties by looking simultaneously at a large collection of mode trees, all based on some variation of the original data, by means such as resampling or jittering. The result is both visually appealing and informative.
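The idea of a mode tree and its aggregation into a mode forest can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' algorithm: it assumes a Gaussian kernel, uses bootstrap resampling as the data variation, and summarises the forest as a table of counts of how often a mode appears at each (bandwidth, location) cell. All function names and the defaults `n_trees` and `seed` are hypothetical.

```python
import numpy as np

def gaussian_kde_on_grid(grid, data, h):
    # Gaussian kernel density estimate with bandwidth h on a fixed grid.
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def mode_locations(grid, density):
    # Signs of the non-flat steps; a rising step followed by a falling
    # step marks a mode, located at the point just before the fall.
    d = np.sign(np.diff(density))
    nz = np.flatnonzero(d)
    s = d[nz]
    peaks = nz[1:][(s[:-1] > 0) & (s[1:] < 0)]
    return grid[peaks]

def mode_tree(data, bandwidths, grid):
    # One array of mode locations per bandwidth.
    return [mode_locations(grid, gaussian_kde_on_grid(grid, data, h))
            for h in bandwidths]

def mode_forest(data, bandwidths, grid, n_trees=50, seed=0):
    # Overlay mode trees built from bootstrap resamples of the data.
    rng = np.random.default_rng(seed)
    counts = np.zeros((len(bandwidths), len(grid)), dtype=int)
    for _ in range(n_trees):
        resample = rng.choice(data, size=len(data), replace=True)
        for i, modes in enumerate(mode_tree(resample, bandwidths, grid)):
            # Mode locations are grid points, so searchsorted recovers
            # their indices exactly.
            counts[i, np.searchsorted(grid, modes)] += 1
    return counts
```

Cells of `counts` with values close to `n_trees` correspond to modes that persist across resamples, while cells with low counts suggest modes that depend on a few observations. Jittering the data instead of resampling would only change the line that builds `resample`.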
The Bumpy Road to the Mode Forest
The mode tree of Minnotte and Scott (1993) provides a valuable method of investigating features such as modes and bumps in an unknown density. By examining kernel density estimates for a range of bandwidths, we can learn a lot about the structure of a data set. Unfortunately, the basic mode tree can be strongly affected by small changes in the data, and gives no way to differentiate between important modes and those caused, for example, by outliers. The mode forest overcomes these difficulties by looking simultaneously at a large collection of mode trees, all based on some variation of the original data, by means such as resampling or jittering. The result is both visually appealing and informative.